Will walk you through these steps with a dataset
Find these slides @ https://bit.ly/2lyGAqr
Originally a worm biologist, now a bioinformatician @ Monash Bioinformatics Platform, more recently R-Ladies Melbourne organiser
This talk can be considered ‘Most Useful Things Worm Adele Would Have Liked to Have Known When Starting Out in R’
Programming language for statistical computing and graphics
Has lots of plotting functionality and is well geared towards data analysis out of the box, with built-in statistical tests
Well-developed ecosystem of software packages that further expand base R for analysis, project management, visualisation, document generation, etc
Continuous active development
Thorough documentation
The marriage between Markdown, a lightweight markup language, and R, a programming language for statistics
An R Markdown file is a plain text document that allows you to embed R code chunks + plain text notes & images & videos.
Structure:
An R Markdown file by itself is quite simple but is neatly rendered into a more complicated document type
*actually supports up to 52 language engines including Python, Julia, C++, MySQL, bash, etc
title: "Rmarkdown Quickstart"
author: "Adele Barugahare"
date: "27/08/2019"
output:
  ioslides_presentation:
    df_print: "paged"
  html_document:
    df_print: "paged"
    toc: true
    toc_depth: 2
  pdf_document:
    number_sections: true
    df_print: "kable"
` ```{r, chunk_options}` `
# Analysis code goes here
x <- 1:10
y <- x * 2
plot(x, y)
# etc
` ``` `
GitHub: aabarug
Repo: quickstart_rmarkdown_sept_2019
The tidyverse is an opinionated collection of R packages designed for data science: ggplot2, dplyr, magrittr, tidyr, readr, etc
Tabular data is represented as data frames, a built-in class
Tidy data:
input         +--------+        +--------+        +--------+      result
 data   %>%   |  verb  |  %>%   |  verb  |  %>%   |  verb  |  ->   data
frame         +--------+        +--------+        +--------+      frame
%>% - Pipe symbol that passes the output of one function on as the input to the next (magrittr package)
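As a small illustration of the pipe, the sketch below chains dplyr verbs on a made-up data frame. The `flights` table, its columns and its values are invented for this example and are not part of the airline dataset used in the talk.

```r
library(dplyr)  # also makes the magrittr pipe %>% available

# Toy data frame, invented purely for illustration
flights <- data.frame(
  airline = c("A", "A", "B", "B"),
  delayed = c(3, 5, 2, 8)
)

# Each verb takes a data frame and returns a data frame,
# so the steps can be chained with %>%
result <- flights %>%
  filter(delayed > 2) %>%
  group_by(airline) %>%
  summarise(total_delayed = sum(delayed))

# The same computation without the pipe has to be read inside-out
nested <- summarise(
  group_by(filter(flights, delayed > 2), airline),
  total_delayed = sum(delayed)
)
```

Both `result` and `nested` hold the same summary; the piped version simply reads top to bottom in the order the steps happen.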
*Dataset* > Manipulate to extract information > Plot > Communicate
Read in data:
## ── Attaching packages ───────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.0 ✔ purrr 0.3.2
## ✔ tibble 2.1.1 ✔ dplyr 0.8.1
## ✔ tidyr 0.8.3 ✔ stringr 1.4.0
## ✔ readr 1.3.1 ✔ forcats 0.4.0
## ── Conflicts ──────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## Parsed with column specification:
## cols(
## Route = col_character(),
## Departing_Port = col_character(),
## Arriving_Port = col_character(),
## Airline = col_character(),
## Month = col_double(),
## Sectors_Scheduled = col_double(),
## Sectors_Flown = col_double(),
## Cancellations = col_double(),
## Departures_On_Time = col_double(),
## Arrivals_On_Time = col_double(),
## Departures_Delayed = col_double(),
## Arrivals_Delayed = col_double(),
## Year = col_double(),
## Month_Num = col_double()
## )
The dataset:
Australian domestic airlines on-time dataset covering 2004 to 2019: 80615 rows and 14 columns, from http://data.gov.au/
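The column specification printed above is what readr reports when it reads a CSV file. A minimal sketch of that step follows, assuming readr is installed. The file is a tiny stand-in written to a temporary location, and its rows, `csv_path` and `df_small` are invented for illustration; the real file location is not shown in these slides.

```r
library(readr)

# Write a tiny stand-in CSV to a temporary file; these rows are invented
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Route,Airline,Year,Month_Num,Arrivals_On_Time",
  "Adelaide-Brisbane,Qantas,2008,1,120",
  "Adelaide-Brisbane,Jetstar,2008,1,45"
), csv_path)

# read_csv() guesses a type for each column and returns a tibble
df_small <- read_csv(csv_path)

# Inspect the column types that were guessed
str(df_small)
```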
Extract route ‘Adelaide-Brisbane’ in the year 2008 & fix up the month column
df2 <- df %>%
  filter(Airline != "All Airlines", Route == "Adelaide-Brisbane", Year == 2008) %>%
  mutate(Month = lubridate::month(x = Month_Num, label = TRUE)) %>%
  head(30)
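The filter-then-mutate step above can be tried without the full dataset. The sketch below uses a handful of invented rows (`toy` and `toy2` are hypothetical names) that mimic a few of the dataset's columns, and assumes dplyr and lubridate are installed.

```r
library(dplyr)

# Invented rows that mimic a few columns of the airline dataset
toy <- data.frame(
  Airline   = c("All Airlines", "Qantas", "Jetstar", "Qantas"),
  Route     = c("Adelaide-Brisbane", "Adelaide-Brisbane",
                "Adelaide-Brisbane", "Adelaide-Sydney"),
  Year      = c(2008, 2008, 2008, 2008),
  Month_Num = c(1, 1, 2, 3)
)

toy2 <- toy %>%
  # Keep individual airlines on one route in one year
  filter(Airline != "All Airlines", Route == "Adelaide-Brisbane", Year == 2008) %>%
  # Turn the month number into an ordered factor of month labels
  mutate(Month = lubridate::month(Month_Num, label = TRUE))
```

Storing the month as an ordered factor means plots will sort it in calendar order rather than alphabetically.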
ggplot2 implements the grammar of graphics, a coherent system for describing and building graphs
Takes a data frame as input, describes which columns map to which aesthetics, and then builds a plot by layering ‘geoms’.
ggplot(df, aes(x = column_A_df, y = column_B_df, color = column_C_df,
               etc...)) +
  geom_point() + geom_boxplot() + geom_etc() +
  geom_line(data = new_data, color = "blue") +
  theme( modifications_to_plot_appearance )
Geoms define the type of plot the data should be displayed as.
The top-level aesthetics & data are passed on to all geoms, but can be overridden by specifying new data/aesthetics for a specific geom
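A runnable sketch of this inheritance and override behaviour, assuming ggplot2 is installed. The data frame `toy` and its columns are invented for illustration.

```r
library(ggplot2)

# Invented data for illustration only
toy <- data.frame(
  month   = rep(1:3, times = 2),
  flown   = c(50, 55, 60, 30, 28, 35),
  airline = rep(c("A", "B"), each = 3)
)

# Top-level data and aesthetics are inherited by every geom
p <- ggplot(toy, aes(x = month, y = flown, color = airline)) +
  geom_line() +
  # This layer overrides both the data and the colour, for itself only
  geom_point(data = subset(toy, airline == "A"), color = "black", size = 3) +
  theme(legend.position = "bottom")
```

Printing `p` draws one coloured line per airline, with black points on airline "A" only.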
gp <- gp + geom_point(aes(y = Sectors_Flown),
                      data = filter(df2, Airline == "Qantas",
                                    Route == "Adelaide-Brisbane", Year == 2008),
                      color = "black")
gp
df2 %>% group_by(Airline) %>%
  plotly::plot_ly(x = ~Month, y = ~Arrivals_On_Time, color = ~Airline,
                  type = 'scatter', mode = 'lines')
Or use it as a wrapper to a ggplot object with ggplotly
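The ggplotly route mentioned above might look like the following sketch, assuming ggplot2 and plotly are installed. The data and the names `toy`, `gp_toy` and `interactive_plot` are invented for illustration.

```r
library(ggplot2)
library(plotly)

# Invented data for illustration only
toy <- data.frame(
  month   = rep(1:3, times = 2),
  on_time = c(40, 42, 39, 20, 25, 22),
  airline = rep(c("A", "B"), each = 3)
)

# Build a static ggplot first
gp_toy <- ggplot(toy, aes(x = month, y = on_time, color = airline)) +
  geom_line()

# ggplotly() converts the ggplot object into an interactive plotly widget
interactive_plot <- ggplotly(gp_toy)
```

Printing `interactive_plot` in an HTML output or in the RStudio viewer shows the same plot with hover, zoom and pan.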
Develop interactive web applications/dashboards through the use of pre-built widgets.
Two parts to a shiny app:
ui <- fluidPage(
  ## Define the user interface here
  ## Layout of input and output widgets
)
server <- function(input, output) {
  # Server code to process inputs from the UI and manipulate data accordingly,
  # then send the result back to the UI to display as an output
}
# Run the app
shinyApp(ui = ui, server = server)
library(shiny)
# Define UI for app that draws a line plot
ui <- fluidPage(
  # Input: widget to select which airline to plot
  selectInput(inputId = "select_airline", label = h3("Select airlines"),
              choices = list("Jetstar", "Qantas", "Virgin Australia"),
              selected = "Jetstar"),
  # Output: Line plot
  plotOutput(outputId = "linePlot")
)
# Define server logic required to draw a line plot
server <- function(input, output) {
  output$linePlot <- renderPlot({
    # Filter the data based on which airline is selected in the widget
    df <- df2 %>%
      filter(Airline == input$select_airline)
    # Generate ggplot
    ggplot(df, aes(x = Month, y = Arrivals_On_Time,
                   group = Airline)) + geom_line()
  })
}
# Run the application
shinyApp(ui = ui, server = server)
##
## Listening on http://127.0.0.1:8443
More examples
* Varistran demo app
* Shiny Gallery
Supported Document Outputs
…and more created by the R community
Packages that further build on top of R Markdown
You can also build Shiny apps into R Markdown documents
Document what you’ve done with your data in code
R Markdown can render multiple different document types from one Rmd file
The more places (files) an analysis is spread across, the more work it is to keep all of it accurate and up-to-date.
R Markdown allows you to focus on generating content & doing your analysis without (hopefully) spending too much time fighting your document itself*
*the more a document is geared towards a particular output type, the harder it is to neatly convert between formats
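Rendering several document types from one Rmd file can be sketched as below, assuming the rmarkdown package is installed. The Rmd contents, `out_dir`, `rmd_path` and the choice of formats are invented for illustration. Rendering also needs pandoc, so the sketch only attempts it when pandoc is found, and it leaves out `pdf_document` because that additionally needs a LaTeX installation.

```r
library(rmarkdown)

# Write a minimal Rmd file to a temporary directory (contents invented)
out_dir <- tempfile("rmd_demo_")
dir.create(out_dir)
rmd_path <- file.path(out_dir, "demo.Rmd")
fence <- strrep("`", 3)  # the three backticks that open and close a chunk
writeLines(c(
  "---",
  "title: \"Demo\"",
  "---",
  "",
  "Some plain text notes.",
  "",
  paste0(fence, "{r}"),
  "summary(1:10)",
  fence
), rmd_path)

# Output formats to produce from the same source file
formats <- c("html_document", "word_document")

# render() returns the path of the file it wrote
if (pandoc_available()) {
  outputs <- vapply(
    formats,
    function(fmt) render(rmd_path, output_format = fmt, quiet = TRUE),
    character(1)
  )
}
```

When the formats are already listed under `output:` in the YAML header, `render(rmd_path, output_format = "all")` produces every one of them in a single call.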